Taming Subgraph Isomorphism for RDF Query Processing

نویسندگان

  • Jinha Kim
  • Hyungyu Shin
  • Wook-Shin Han
  • Sungpack Hong
  • Hassan Chafi
چکیده

RDF data are used to model knowledge in various areas such as life sciences, Semantic Web, bioinformatics, and social graphs. The size of real RDF data reaches billions of triples. This calls for a framework for efficiently processing RDF data. The core function of processing RDF data is subgraph pattern matching. There have been two completely different directions for supporting efficient subgraph pattern matching. One direction is to develop specialized RDF query processing engines exploiting the properties of RDF data for the last decade, while the other direction is to develop efficient subgraph isomorphism algorithms for general, labeled graphs for over 30 years. Although both directions have a similar goal (i.e., finding subgraphs in data graphs for a given query graph), they have been independently researched without clear reason. We argue that a subgraph isomorphism algorithm can be easily modified to handle the graph homomorphism, which is the RDF pattern matching semantics, by just removing the injectivity constraint. In this paper, based on the state-of-the-art subgraph isomorphism algorithm, we propose an in-memory solution, TurboHOM++, which is tamed for the RDF processing, and we compare it with the representative RDF processing engines for several RDF benchmarks in a server machine where billions of triples can be loaded in memory. In order to speed up TurboHOM++, we also provide a simple yet effective transformation and a series of optimization techniques. Extensive experiments using several RDF benchmarks show that TurboHOM++ consistently and significantly outperforms the representative RDF engines. Specifically, TurboHOM++ outperforms its competitors by up to five orders of magnitude.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

gStore: Answering SPARQL Queries via Subgraph Matching

Due to the increasing use of RDF data, efficient processing of SPARQL queries over RDF datasets has become an important issue. However, existing solutions suffer from two limitations: 1) they cannot answer SPARQL queries with wildcards in a scalable manner; and 2) they cannot handle frequent updates in RDF repositories efficiently. Thus, most of them have to reprocess the dataset from scratch. ...

متن کامل

BitMat: An In-core RDF Graph Store for Join Query Processing

With the growing size of RDF data sources, the need for a compact representation providing efficient query interface has become compelling. In this paper, we introduce BitMat, a main memory based compressed bit-matrix structure. The key aspects of BitMat are as follows: i) its RDF graph representation is very compact compared to the conventional disk-based and existing main-memory RDF stores, a...

متن کامل

Taming verification hardness: an efficient algorithm for testing subgraph isomorphism

Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on the framework of filtering-and-verification to reduce the p...

متن کامل

Well Behaved RDF: A Straw-Man Proposal for Taming Blank Nodes

The RDF language (Resource Description Framework) allows nodes in an RDF graph to be unlabeled – “blank nodes”. While blank nodes and certain other features are convenient for RDF authors, their unrestricted use causes complications to RDF consumers, such as when attempting to compare RDF graphs, which in the general case is as difficult as the graph isomorphism problem. This paper proposes a s...

متن کامل

Querying RDF Data Using A Multigraph-based Approach

RDF is a standard for the conceptual description of knowledge, and SPARQL is the query language conceived to query RDF data. The RDF data is cherished and exploited by various domains such as life sciences, Semantic Web, social network, etc. Further, its integration at Web-scale compels RDF management engines to deal with complex queries in terms of both size and structure. In this paper, we pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • PVLDB

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2015